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A.  Summary  of  Progress  -  7/15/96  to  7/14/99 

Significant  progress  was  made  on  all  of  the  theoretical  issues.  However,  as 
mentioned  in  our  previous  progress  reports,  we  encountered  some  interesting 
problems  related  to  implementation  issues.  We  devised  a  number  of  approaches  to 
solve  these  problems,  and  this  was  the  subject  of  considerable  work  during  the  last 
year  of  the  project.  For  the  most  part,  the  difficulties  have  been  overcome.  An 
additional  major  theoretical  question  that  we  considered  centered  on  methods  for 
database  indexing  for  content  based  retrieval  and  geometric  hashing.  Approaches  to 
this  question  were  addressed  in  two  recent  papers,  but  further  improvements  in  the 
methods  and  algorithms  are  required.  Finally,  as  a  result  of  this  research,  we  are 
proposing  a  new  paradigm  for  geometric  feature  recognition  in  the  full  perspective 
case  that  shifts  the  focus  to  certain  toric  or  generalized  tone  subvarieties  in  projective 
space.  The  resulting  equations  should  be  more  global  in  character  and  less  sensitive 
to  instablity  resulting  from  geometric  degeneration  and/or  noise.  With  the  help  of  Dr. 
Greg  Arnold  at  the  Air  Force  Research  Laboratory  at  Wright  Patterson  Air  Force  Base, 
we  have  done  some  preliminary  work  that  indicates  this  approach  is  feasible  and  will 
be  considerably  more  robust  than  previous  methods.  We  hope  to  make  this  the 
subject  of  our  next  proposal  to  AFOSR. 

As  previously  reported,  we  have  completely  determined  the  object/image  equations 
for  configurations  of  point  and  line  features  in  the  perspective  case.  The  results 
appear  in  the  papers  cited  below.  We  also  focused  on  Grobner  basis  and  sparse 
resultant  techniques  (including  Dixon  and  KSY  resultants)  to  provide  a  symbolic 
computational  approach  to  generating  the  object/image  equations  for  various  sets  of 
object  features  and  other  camera/sensor  models.  These  symbolic  computational 
aspects  were  discussed  in  a  paper  presented  at  the  ISSAC  meeting  in  Hawaii. 
Grobner  bases  methods  proved  intractible  -  they  simply  didn't  work  on  computations 


of  the  size  we  were  dealing  with.  The  best  method  proved  to  be  a  modified  KSY 
resultant,  working  modulo  large  primes,  and  specializing  different  subsets  of  the 
variables  to  fixed  numericai  vaiues.  Once  the  degrees  of  the  variables  in  the 
object/image  equations  were  determined,  we  could  interpolate  to  get  the  actual 
equations  by  taking  lots  of  object/image  pairs.  So  far  that  hasn't  been  necessary,  as 
the  modified  KSY  answer  has  proved  to  be  the  right  one  in  characteristic  zero. 
Computational  swell  is  still  a  problem  however.  For  example,  at  an  intermediate 
stage  in  the  computation  of  the  object/image  equation  for  the  recognition  of 
configurations  of  6  line  segments,  the  computation  occupied  53MB.  However,  the 
final  answer  collaspes  to  a  relatively  small  polynomial. 

We  also  made  some  progress  in  our  effort  to  explore  the  feasibility  and  robustness 
of  several  algorithms  based  on  our  results  (making  use  of  the  geometric  invariant 
theory  and  computational  algebraic  geometry).  These  algorithms  have  been  used  to 
index  various  sorts  of  databases  for  content-based  retrieval.  The  computation  of  the 
object/image  equations  is  of  course  a  pre-compute,  once  found  for  a  given  feature  set 
they  are  easily  evaluated  in  real-time  to  test  for  object/image  matching.  The  real 
challenge  is  to  do  geometric  hashing  using  the  resulting  equations  when  a  very  large 
database  is  involved. 

We  did  develop  a  multi-dimensional  access  scheme  based  on  a  somewhat 
complicated  hashing  technique  that  works  mildly  well.  For  large  databases  (e.g. 
10,000  items)  we  can  do  recognition  without  accessing  more  than  about  15%  of  the 
items  in  worst  cases.  Moreover  the  method  seems  to  get  more  efficient  as  the  size  of 
the  database  increases.  A  more  advanced  method  based  on  a  sophisticated 
polyhedral  subdivision  scheme  is  under  investigation. 

We  also  have  a  demo  we  built  in  JAVA  that  will  recognize  aircraft  types  in  photos. 
We  have  a  3D  database  made  using  line  drawing  from  Janes.  The  user  selects 
obvious  key  points,  like  the  nose,  wing  tips,  etc.  in  the  photo  and  our  algorithms 
evaluate  the  object/image  equations  (using  the  3D  and  2D  invariants)  to  come  up  with 
a  good  candidate  aircraft  for  the  one  in  the  photo.  The  method  is  completely  view 
independent.  The  Sarnoff  Corporation  in  Princeton,  New  Jersey  has  an  early  version 
of  this  demo  which  uses  our  approach.  It  has  been  of  interest  to  the  DOD  Intellignece 
community  as  an  aide  to  photo  interpreters.  Finally,  we  have  become  interested  in  the 
potential  of  our  methods  for  video  indexing  where  dynamic  motion  is  involved. 

Most  recently  we  have  begun  a  collaboration  with  Air  Force  researchers  at  Wright 
Patterson.  They  are  interested  in  applying  our  techniques  to  sequences  of  images 
and  to  other  types  of  imagery,  most  notably  SAR  images.  Our  contacts  at  Wright 
Patterson  at  Drs.  Greg  Arnold  and  Vince  Velten  of  AFRUSNAT  (Target  Recognition 
Branch).  We  hope  to  submit  a  proposal  based  on  this  collaboration  to  AFOSR  in 
the  near  future. 

Also,  we  were  put  in  contact  with  engineers  at  Vexcel  Corp.  in  Boulder,  CO  by 
Dr.  Nachman  of  AFOSR.  We  visited  Vexcel  in  summer  1999  to  discuss  our  results. 


b.  publications 


We  have  eight  papers  that  have  been  accepted  for  publication  and  others  in 
preparation,  inducing  a  joint  paper  with  Lewis  and  Nakos  that  will  likely  appear  in  the 
Journal  of  Symbolic  Computation.  Copies  of  the  two  newest  published  works  are 
attached.  Copies  of  the  others  were  attached  to  our  previous  reports. 

1.  Stiller,  P.  F.,  Asmuth,  C.  A.,  and  Wan,  C.  S.,  "Single  View  Recognition  -  The 
Perspective  Case,"  Proceeding  SPIE  Int'l  Conf.,  Vision  Geometry  V,  Vol.  2826, 

Denver,  CO,  8/96,  pp.  226-235  (1996). 

2.  Stiller,  P.  F.,  "General  approaches  to  recognizing  geometric  configurations  from  a 
single  view,"  Proceedings  SPIE  Int'l  Conf,  Vision  Geometry  VI,  Vol.  3168,  San  Diego, 
CA,  7/97,  pp.  262-273  (1997). 
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Abstract 

The  “Six-line  Problem”  arises  in  computer  vision  and  in  the  automated  analysis  of  images.  Given  a  three-dimensional  (3D) 
obto  tarn*  (tor  .topi,  »,  lto>  to  vi,  tomq.es  tarn  elgebrmc  geomeby  » > 

invariant  theory  produces  a  set  of  3D  invariants  that  represents  that  feature  set.  Suppose  that  later  an  object  is  encountered  in 
an  image  (for  MtUnple,  a  photograph  taken  by  a  camera  modeled  by  standard  perspective  projection,  i.e.  a  pinhole  camera), 
and  suppose  further  that  six  lines  are  extracted  from  the  object  appearing  in  the  image.  The  problem  is  to  decide  ^  object  m 
the  image  is  the  original  3D  object.  To  answer  this  question  two-dimensional  (2D)  invariants  are  computed  from  *elm 
the  image.  One  can  show  that  conditions  for  geometric  consistency  between  the  3D  object  features  and  thelDimagefea  es 
can  be  Expressed  as  a  set  of  polynomial  equations  in  the  combined  set  of  two-  and  three-dimensional  invanants.  The  object  n 
the  image  is  geometrically  consistent  with  the  original  object  if  the  set  of  equations  has  a  solution.  One  well  known  method  to 
attack  such  sets  of  equations  is  with  resultants.  Unfortunately,  the  size  and  complexity  of  this  problem  made  it  appear 
overwhelming  until  recently.  This  paper  will  describe  a  solution  obtained  using  our  own  variant  of  the  Cayley-Dixon  Kapur- 
Saxena-Yang  resultant.  There  is, reason  to  believe  that  the  resultant  technique  we  employ  here  may  solve  other  complex 
polynomial  systems.  ©  1999  IMACS/Elsevier  Science  B.V.  All  rights  reserved. 

Keywords:  Dixon  resultant;  Fermat  program;  Six-Line  Problem 


1.  Introduction 

The  recognition  problem  for  six  lines  (Six-Line  Problem)  arises  in  computer  vision  and  in  the 
automated  recognition  of  three-dimensional  objects.  From  an  object,  six  lines  are  exuracted,  and  from 
those  six  lines,  nine  three-dimensional  (3D)  invariants  are  computed  as  a  kind  of  signature.  Later,  a 
two-dimensional  “snapshot”  of  some  possibly  different  object  is  obtained  from  an  arbitrary 


‘Expanded^emontrtalks  presented  at  the  Maui  IMACS  meeting,  July  1997,  and  the  Prague  IMACS  meeting,  August 
1998. 

2Partially  supported  by  the  Office  of  Naval  Research. 

3Partially  supported  by  the  Air  Force  Office  of  Scientific  Research. 

0378-4754/99/520.00  ©  1999  IMACS/Elsevier  Science  B.V.  All  rights  reserved. 
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perspective,  and  from  this  snapshot  six  lines  are  extracted  leading  to  the  computation  of  four  two- 
dimensional  (2D)  invariants.  The  question  is:  Is  the  snapshot  a  picture  of  the  original  object,  i.e.  a 
perspective  projection  of  the  original  six  lines?  We  desire  a  method  that  can  rapidly  and  reliably  decide 
if  a  given  set  of  2D  data  represents  the  same  3D  object,  or  at  least  that  a  given  2D  set  cannot  represent 

that  object.  .  . 

Using  algebraic  geometry.  Stiller  [5]  showed  that  there  should  be  a  single  equation  relating  the  nine 

3D  invariants  to  the  four  2D  invariants.  He  reduced  the  problem  to  a  system  of  four  equations  in  16 
variables  involving  three  additional  variables  (actually  four,  but  one  may  be  set  to  1).  The  resulting  four 
polynomial  equations  dt  =  0,/  =  1, . . .  ,4  in  the  three  new  variables  are  quadratic  and  involve  the 
9+4=13  invariants  as  parameters  in  the  coefficients.  The  image  is  consistent  with  the  original  object  if 
and  only  if  the  four  equations  have  a  solution  in  the  three  variables  (subject  to  a  mild  nondegeneracy 
constraint)4  Note  that  we  do  not  need  to  know  what  the  values  of  the  three  auxiliary  variables  actually 
are,  only  that  a  solution  exists.  Image  recognition  questions  of  this  general  type,  but  for  points,  were 
considered  by  Quan  [4]  and  Stiller  et  al.  [6]. 

The  solution  of  systems  of  polynomial  equations  is  important  in  many  fields  of  applied  mathematics. 
One  of  the  classic  methods  of  solving  such  systems  is  with  resultants.  In  general  a  resultant  is  a  single 
polynomial  derived  from  a  system  of  polynomial  equations  that  encapsulates  the  solution  (common 
zeroes)  of  the  system.  The  Sylvester  Determinant  is  the  best  known  method  of  computing  a  resultant. 
However,  it  is  not  a  realistic  tool  for  solving  equations  of  more  than  one  variable.  Other  methods  exist, 
which  usually  compute  not  the  resultant  itself  but  rather  a  multiple  of  it,  containing  extraneous  factors. 
The  standard  Macaulay  resultant  yields  no  information  for  our  problem  since  both  die  numerator  and 
denominator  determinants  are  identically  zero.  Another  resultant  method  is  that  of  Dixon  (generalizing 
Cayley),  recently  extended  by  Kapur  et  al.  [2].  The  authors  of  that  paper  show  that  their  method  must 
work  if  a  certain  condition  holds.  The  condition  is  rather  strong,  and  in  our  case  it  is  not  satisfied.  Yet 
we  are  able  to  make  the  method  work  anyway.  This  suggests  to  us  that  more  theoretical  work  should  be 
done  on  the  Dixon-Kapur-Saxena-Yang  approach,  and  that  probably  our  approach  here  will  succeed 
for  many  problems  of  interest. 


2.  The  basic  geometric  approach 

The  moduli  space  of  equivalence  classes  of  (semi-stable)  six-tuples  of  lines  in  P3,  projective  3-space, 
under  the  action  of  projective  transformations  (the  matrix  group  PGL4, 4x4  matrices  modulo  scalars)  is 
a  rational  variety  of  dimension  9.  We  can  thus  expect  to  find  9  functions  of  the  parameters  defining  the 
lines  which  are  invariant,  in  the  sense  that  they  provide  coordinates  on  a  Zariski  open  set  of  the  moduli 
space.  We  explain  briefly  how  this  is  done.  It  is  sufficient  to  work  in  a  Zariski  open  subset  of  the  set  of 
6-tuples  of  lines,  so  we  will  not  hesitate  to  impose  various  general  position  assumptions  that  will 
become  apparent  below. 

Let  t\,  £2.  £3,  £4,  £5.  and  £ 6  be  six  lines  in  space.  We  assume  t\,  £2,  and  £3  are  mutually  skew  (our  first 
general  position  assumption).  Without  loss  of  generality,  we  can  complexify  and  work  in  complex 
projective  space  P3.  Since  lines  in  P3  are  parameterized  by  the  4-dimensional  (complex) 

4We  do  not  assume  homogeneity.  Thus,  we  expect  n+1  equations  in  n  variables  to,  in  general,  not  be  solvable.  The  resultant 
places  a  constraint  on  the  13  coefficient  parameters  that  characterizes  solvability. 
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Grassmannian,  G( 2,4),  of  two-planes  through  the  origin  in  (complex)  four-space,  an  ordered  six-tuple 
(£1, . . .  ,£(,)  of  lines  can  be  viewed  as  a  point  in  the  24-dimensional  manifold  X  =  G( 2,4)  x  •  •  •  x 
G(2,4).  The  group  PGL4  of  projective  linear  transformations  acts  on  P3  sending  lines  to  lines  and 
hence  acts  on  X  sending  a  6-tuple  of  lines  to  another  6-tuple.  We  are  interested  in  the  quotient 
X  =  X/PGU  of  X  by  this  action.  Since  PGL4  is  15-dimensional,  we  expect  X  to  have  dimension  9.  For 
various  technical  reasons  (in  fact  to  get  a  good  quotient  space)  we  must  limit  ourselves  to  an  open  dense 
subset,  in  fact  a  Zariski  open  subset,  U  of  X,  and  construct  the  quotient  U  =  JJ/PGLa.  For  example,  the 
requirement  that  £u  £2,  and  £3  be  mutually  skew  is  one  of  the  conditions  defining  U. 

Now  lines  in  projective  space  correspond  to  planes  through  the  origin  in  4-space,  and  two  skew  lines 
correspond  to  two  planes  that  intersect  only  in  the  origin.  We  can  therefore  move  £,  to  the  z,  w-plane 
and  t2  to  the  x,  y-plane  by  a  4x4  invertible  matrix.  In  this  position,  £x  corresponds  to  the  z-axis  in  space 
and  £2  corresponds  to  a  line  at  infinity  that  meets  both  the  x-  and  y-axes.  Specifically  the  points 
(0 : 0:  1:  0)  and  (0 : 0:  0:  1)  will  be  on  £x  and  likewise,  £2  will  contain  the  points  (1  :  0:  0:  0)  and 
(0:  1:  0:  0). 

Having  moved  £\  and  £2  to  the  above  “canonical”  positions,  the  4x4  invertible  matrices  that  fix  these 
two  lines  have  the  form: . 


( a  b  \  0  0^ 

c  d  \  0  0 

0  0  :  e  f 

\0  0  :  g  h) 


(1) 


with  ad  -  be  ^  0  and  eh  -  gh^O. 

Now  £2  is  assumed  to  be  skew  to  both  £x  and  £2.  Suppose  :  n, :  rx :  j,)  and  ( m2  :n2:r2:  s2)  are 
two  distinct  points  on  £3,  which  is  then  the  line  a  (mi  :  n}  :  rx  :  si)  rf  /3(m2  :  n2  :  r2  :  s2)  =  (ami  + 
0m2  :  ati\  +  @n2  :  ar\  +  0r2  :  a^i  +  fis2)  a s  (a:  (3)  runs  through  all  points  in  P1.  If  £2  were  to  meet 

£1,  we  would  have  ami  +  0m2  =  0  and  anx  +  0n2  =  0  for  sonie  non-trivial  (a,3).  This  can  happen  if 

and  only  if  det  (  m‘  mi  )  =  0.  Thus  £3  being  skew  to  £\  means  det  (  m'  m2  )  ^  0.  Likewise  £3 

\n\  n2  J  n2  J  ~  3 

skew  to  £?  means  det  |  r‘  rj  )  ^  0. 

V^i  $2  J  ' 

We  can  choose: 

(a  b\  =  (mi  mA~l 

\c  d)  n2  J 

and 


so  that  the  4x4  matrix  (1)  above  moves  £3  to  the  line  through  (1  :  0  :  1  :  0)  and  (0  :  1  :  0  :  1)  without 
moving  ^  or  £2. 
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The  set  of  4x4  matrices  fixing  £u  £2,  and  £2  consists  of  all  matrices  of  the  form: 


(a  b  \ 

:  * 

c  d 

a  b 

*  : 

\  c  dj 

where  ad  -  be  ^  0.  In  other  words,  we  are  reduced  to  finding  invariants  for  an  action  of  PGLn  on  the 
remaining  three  lines. 

Assume  now  that  £4  is  skew  to  £\  and  goes  through  the  points  (mi  :  n  1  :  ?\  :  st)  and 
(m2  :  h2  :  r2  :  s2)  .  Our  group,  PGL 2,  which  fixes  £u  £2,  £2,  will  act  on  £4  as  follows: 


/a  b  \ 

:  * 

c  d 


\ 


a 


c 


b 

d) 


where  we  will  have  det 


f  fh\ 
\hx 


/mi  m2\ 


\h  h  J 


0  (because  £4  is  skew  to  £{).  Here  the  line  is  represented  by  a 


4x2  matrix  whose  columns  are  the  homogeneous  coordinates  of  two  points  on  the  line.  Now  without 

loss  of  generality,  we  can  assume 
The  action  yields: 


/mi  m2\  f  1  0\ 
\»i  ”2/  \°  l)' 


(  ^ 


which  is  a  new  line  £  going  through  the  two  points  given  by  the  columns  of  this  4x2  matrix. 
Choosing  two  different  points  on  £  amounts  to  postmultiplying  by  an  arbitrary  invertible  2x2  matrix. 

We  can  choose  ^  for  this  purpose.  This  means  that  £  can  be  given  by: 

(l  0  \ 

0  1 

V  *  ) 
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where  N  is  the  2x2  matrix: 


• 

In  other  words,  the  orbit  of  £  is  just  the  orbit  of  N  -  (  ”u  ”12  |  under  conjugation. 

(t  0\  ^"21  ”22' 

The  orbits  with  N  a  scalar  matrix,  N  =  (  Q  ),  are  just  points,  i.e.  they  are  fixed  points  of 

the  action.  The  nature  of  the  orbits  with  N  not  scalar  depends  on  the  Jordan  form  of  N.  The 
possibilities  are: 

Case  1 :  ^  q  *  ^ .  Here  the  orbit  is  2D  since  the  matrices  which  fix  ^  ^  under  conjugation  (i.e. 
commute  with  f  »  |  ^  are  of  the  form  ( ^  0. 


are  of  the  form 


Case  2:  ^q*  ^  ^  with  Ai  ^  A2.  Here  the  orbit  is  2D  since  the  matrices  which  fix  ^ 


under  conjugation  are  of  the  form  ^ ^  ^  J. 

We  will  assume  that  £4  is  in  case  2,  which  is  the  generic  case.  In  other  words,  we  will  assume  that 
(  r}  r}  )  has  distinct  (unequal)  eigenvalues.  Thus  we  can  move  £4  to  either  the  line  through 

\  5  j  / 

(1  :  0  :  Ai  :  0)  and  (0  :  1  :  0  :  A2)  or  the  line  through  (1  :  0  :  A2  :  0)  and  (0  :  1  :  0  :  Aj).  This  ambiguity 

arises  because  Jordan  form  in  this  case  is  not  unique!  It  can  be  either  ^  ^ 1  a2  )  °r  ( A|  ) '  ^ow 

fix  £4  to  be  one  of  these  two  lines.  (It  does  not  matter  which;  moreover  we  will  never  in  practice  need  to 
make  a  choice  between  the  two.) 

The  transformations  that  fix  £1,  £2,  £3  and  £4  take  the  form: 


a  0 


modulo  scalar  matrices.  Thus  we  have  essentially  reduced  the  group  to  C*  x  C*/C*  =  C’  where  the  C* 
in  the  quotient  is  embedded  diagonally  in  C*xC*.  We  say  “essentially”,  because  there  is  still  a  Z2- 

action  lurking  that  switches  ^  ^  and  ^  ^ .  This  is  accounted  for  below. 

If  we  assume,  in  addition  to  £4,  that  £5  and  £6  are  skew  to  £1,  then  we  can  reinterpret  our  problem  as 
one  of  finding  invariants  for  the  action  of  PGL2  on  the  three-fold  product  of  2x2  matrices  by 


.  This  is  accounted  for  below. 
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conjugation  in  each  factor;  specifically: 

(N4,N5,N6)  —  (A^4A“i,A^5A-i,A^6A-1) 

for  A  an  invertible  2x2  matrix  representing  an  element  of  PGL2.  Here  i  =  4,5,6  is  the  line  passing 
through  the  points  in  P3  which  are  the  columns  of  the  4x2  matrix: 


A  known  set  of  invariants  are  the  traces  of  Nu  N2,  N3,  Nj,  N\,  N],  NXN2,  N\N2,  N2N2  and  N{N2N2 
which  have  one  relation  among  them.  We  take  a  different  approach.  Since  we  have  assumed  that  N4  has 

distinct  eigenvalues,  we  can  find  an  A  which  conjugates  N4  to  either  (  )  or  (  0  ). 

Consider  the  following  subgroup  G  of  PGL 2:  '  2 '  '  ' 

G={(0  ^>a7^0,  d^O  or  ^  b  ^  0,  c  ^  ojmod  scalars. 

Action  by  G  leaves  N4  in  diagonal  (Jordan)  form.  Thus  we  can  reduce  our  action  to  one  of  G  acting  on 
(CxC— A)xN5xN6  where  A  is  the  diagonal  in  CxC  and  where  we  identify  (A,,  A2),  A|  ^  \2,  in 

Cx€-A  with  (^' 

We  now  try  to  move  t5  to  a  canonical  position  using  just  the  C*  action  of  ^  ^  mod  scalars.  (This 

does  not  depend  on  our  choice  for  the  position  of  £4.)  If  we  assume  £s  is  skew  to  £,  so  that  it  can  be 
taken  to  go  through  the  points  (1 :  0  :  n\  \ :  n2\)  and  (0:1:  ri\2  :  n22 ),  then  the  group  acts  via: 


n\\  ti\2 
n2\  n22 


ClTl\  1  CUl\2 

dn2\  dn22 


which  is  the  same  line  as: 


*11  y»i2 
;”2i  n22 


We  will  assume  that  i5  is  sufficiently  generic  so  that  nl2  ^  0  and  n2\  ^  0.  We  can  then  normalize 
”2!  ”22  )  S°  ***  ”12  =  "2I  =  8  ^  by  choosing  {d/a)  =  yjn\2jn2 
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Note  that: 


{0  b\  f nii  «i2 \  / 0  b\  l_fn 22  f”21 

\c  Oj  \n2\  n22)  \c  OJ  Vf/112  nu  J' 

Thus  if  we  normalize  Ns  to  (  ”n  *  J,  g^O,  then  the  elements  in  the  subgroup  G  which 

\g  n22  J 

preserve  our  “normal  form”,  namely  that  N4  be  diagonal  and  that  N5  have  equal  off-diagonal  elements 
(non-zero),  form  a  subgroup  H: 


«={(o  :)-^°}u{(2  -o0)'«*o}u{(“  o).^°}u{c  o)'°*°} 


mod  scalars.  Clearly  H  <  PGLi  is  a  finite  group  isomorphic  to  Z2xZ2. 

We  are  therefore  reduced  to  the  action  of  this  finite  group  H  on  U  =  (C  x  C  -  A)  x  (C2  x  €*) 
xC4  C  C9  with  coordinates  (Ab  A 2,nu,  n22,  g,pu,  P 12,  P2UP22)  where  i6  is  assumed  skew  to  £1  so  that 
it  can  be  represented  by: 

fl  0  \ 

0  1' 

P 11  P 12 
\P2l  P22  / 

Note  that  UcC9  is  defined  by  g  ^  0  and  Aj  ^  A2,  i.e.  by  g(Ai  -  A2)  #  0.  Thus  U  is  an  affine  variety 
with  coordinate  ring: 


R  =  C 


Ai,  A2, 


t - 7-  ,nu,n22,g,~  ,pn,p\2,p2i,p22 

Ai  -  A2  g 


and  function  field: 


F  =  C(Ai,  A2,nn,n22,g,/?ii,Jpi2,p2i,p22). 

The  desired  quotient  variety  U/H  is  affine  with  coordinate  ring  given  by  the  invariants  RH  and  function 
field  given  by  the  fixed  field  FH.  One  can  show  that  this  variety  is  rational,  i.e.  FH  is  a  field  of  rational 
functions  in  nine  algebraically  independent  quantities  -  the  desired  invariants. 

To  generate  the  desired  equations  one  works  with  the  nine  “invariants”  Aj,  A2,  nn,  n22,  g,  pu,  px2, 
p2i,  and  P22  (modulo  the  action  of  H  =  Z/2Z  x  Z/2Z).  In  the  plane  one  will  have  four  standard 
invariants  q\,q2,  q2,  q4  which  are  rational  expressions  in  the  coefficients  of  the  six  lines  +  b,y  +  c, 
=  0  viewed  in  P2  as  the  points  (a, :  b, :  c,),  i=  1,...,6.  These  are  q\  =  qs.o/qsi,  qi  =  ?5.i/<?5.2, 
<?3  =  <?6,o/<?6,2,  <?4  =  96,1 /?6, 2  in  the  notation  of  [6]. 

Now  one  can  use  the  above  invariants,  and  the  description  of  the  relationship  between  six  lines  in  3D 
and  six  lines  in  2D  as  a  correspondence  (in  the  sense  of  algebraic  geometry),  to  produce  a  system  of 
four  equations  in  17=9+4+4  variables,  nine  3D  invariants,  four  2D  invariants,  and  four  variables 
which  represent  an  invertible  2x2  matrix: 

C^U  ^11^22  -  ^21^12  7^  0 

\a21  <322/ 
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acting  by  conjugation  as  above  on: 


The  key  result  is  that  it  can  be  shown  that  these  3D  configurations  for  fixed  Xu  A2,  . . P22  and 
variable  atJ  sweep  out  a  Zariski  open  set  of  the  3D  set  of  all  possible  2D  equivalence  classes  obtainable 
by  all  possible  perspective  projections.  The  resulting  four  equations  appear  in  Appendix  A.  Note  they 
are  linear  in  the  2D  invariants,  quadratic  in  the  3D  invariants  and  homogeneous  quartic  in  the  a^.  By 
eliminating  the  a,-,  one  arrives  at  the  desired  object/image  equation.  This  is  the  problem  we  take  up.  One 
complication  is  that  the  system  always  has  degenerate  solutions  a,y  where  fli  1^22  -  « 12^21  =0.  This  is 
what  causes  the  classical  Macaulay  resultant  to  fail. 

The  reader  may  wonder  about  the  fact  that  Ai,  A2,  . . .,  p22  are  not  quite  invariant  and  that  a  Z2xZ2 
action  still  lurks.  This  causes  no  serious  problem.  In  fact  a  test  of  the  final  single  resultant  equation 
relating  Aj, . . .,  p22,  qu  . . .,  qA  shows  it  to  be  invariant  under  this  action.  For  simplicity  we  stick  with 
these  “not  quite  invariant”  invariants. 


3.  The  basic  computational  approach 

We  have: 

•  Nine  3D  parameters:  A,,  A2,  nn,  n2 2,  g,  pu,  Pn,  Pi\,  P22- 

•  Four  2D  parameters:  qi,  q2,  <73,  <74. 

•  Three  (initially  four)  conversion  variables:  (an  =  1),  ai2,  a2i,  a22. 

•  Four  quartic  equations  (see  Appendix  A)  in  the  variables  atj  and  the  13  parameters. 

The  four  equations  have  the  useful  property  that  <7,  appears  only  in  equation  i,  and  only  with  degree  1. 
It  is  therefore  quite  easy  to  solve  for  each  <7,  in  terms  of  the  other  variables.  While  this  is  an  unnatural 
thing  to  do  from  the  standpoint  of  the  Six-Line  Problem,  we  will  exploit  it  later  to  check  answers. 

The  Cayley-Dixon  method  to  eliminate  the  three  variables  a-,-  may  be  summarized  as  follows  (see  [2] 
for  details): 

•  Adjoin  three  new  auxiliary  variables,  r,  s,  t. 

•  Create  the  Dixon  matrix,  DM.  Then  compute  the  Dixon  polynomial: 

dm  det  (DM) 

{r  -  ai2){s  -  a2i)(t  -  a22)' 

•  If  desired,  we  may  work  with  a  certain  “fixed  object,”  i.e.  a  set  of  numerical  3D  invariants.  Stiller 
provided  an  algorithm  for  creating  such  test  cases  of  3D  (and  corresponding  2D)  data  sets.  The  data 
are  integers  or  rational  numbers.  We  may  then  substitute  into  dm  some  or  all  of  the  nine  3D 
numerical  values.  This  reduces  the  size  and  complexity  of  dm. 

•  Create  the  second  Dixon  matrix  by  extracting  coefficients  from  dm  in  a  certain  way.  These 
coefficients  are  polynomials  in  the  four  2D  parameters  q i  =  1 , . . . ,  4  and  those  3D  parameters  that 
remain  from  the  previous  step.  It  is  a  105  x  105  matrix. 
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•  The  determinant  of  this  second  matrix  is  the  classical  Dixon  Resultant.  If  there  is  a  common  solution  of 
the  original  system  of  four  equations,  then  this  determinant  must  be  0.  Ideally  that  provides  an  equation 
that  must  be  satisfied  by  the  parameters.  However,  in  our  case  (and  in  many  others)  it  is  identically  0. 

But  that  is  not  the  end  of  the  story.  The  Kapur-Saxena-Yang  (KSY)  method  continues: 

•  Extract  the  non-zero  rows  and  columns  from  the  second  matrix.  This  leaves  a  51  x  56  matrix.  Call 
this  the  third  matrix. 

•  If  a  certain  condition  holds  on  the  third  matrix,  compute  the  determinant  of  any  maximal  rank 
submatrix.  These  polynomials  must  vanish  if  the  original  system  has  a  solution. 

In  other  words,  these  necessarily  nonzero  polynomials,  any  of  which  we  will  call  ksy,  play  the  role  of 
the  classical  Dixon  Resultant.  We  will  have  more  to  say  about  the  “certain  condition”  in  Section  5. 

4.  Phase  One  of  the  computation 

Unless  some  of  the  numerical  3D  parameters  from  a  “real”  object  are  substituted  into  dm  before  the 
creation  of  the  second  Dixon  matrix,  the  polynomials  ksy  will  be  hopelessly  large  for  any  existing 
computer  system.5  In  the  first  phase  of  the  project,  we  substituted  rational  values  for  all  nine  3D 
parameters,  thus  reducing  the  goal  to  computing  a  resultant  for  that  one  object  -  a  polynomial  in  the 
four  2D  invariants  q\,  q2,  q 3,  q4. 

•  Input  the  numerical  (rational  or  integral)  data  for  the  nine  3D  parameters.  For  example  Aj,  A2,  . . ., 
P22= 3,  4,  2,  3,  2,'  3,1,  2,  1/2. 

•  Compute  ksy,  a  polynomial  in  the  four  parameters  q\,  q2,  qz,  q4. 

•  To  determine  if  a  set  of  2D  data  qu  q2,  q2,  q4  “matches”  the  3D  object,  substitute  the  four  numerical 
values  into  ksy  and  see  if  the  result  is  0.  If  it  is  not,  the  2D  set  cannot  be  a  perspective  projection  of 
the  3D  object. 

An  important  simplification  results  by  reconsidering  what  is  meant  by  “the  result  is  0”.  Recall  that 
the  coefficients  of  the  polynomial  ksy  are  rational  numbers.  Since  we  seek  solutions  of  ksy ■  -0.  we  can 
clear  out  denominators  and  assume  that  all  coefficients  are  integers.  Rather  than  work  ovr,i  the  ring  of 
integers,  we  can  save  enormously  in  both  time  and  space  if  we  choose  a  moderately  large  (20,000- 
50,000)  prime  number  p  at  random  and  reduce  all  the  equations  modulo  this  prime.  We  are  then 
working  over  the  field  Zp,  and  it  is  sufficient  to  test  a  candidate  set  of  2D  parameters  jj,  s2,  S3,  s4  by 
reducing  them  modulo  p  and  checking  ksy{s\,s2, S3,s4)  =  0  in  Zp.  The  resulting  modular  algorithm  is 
probabilistic,  with  a  very  high  probability  of  success.  An  incorrect  set  of  parameters  will  not  pass  the 
modular  test  unless  ksy(slt  s2,  S3,  s4)  is  a  multiple  of  p.  The  probability  of  that  is  no  more  than  Up.  A 
correct  set  of  parameters  will  pass  it  unless  one  of  the  parameters  is  a  fraction  with  denominator  a 
multiple  of  p.  Since  p  will  be  around  20,000,  this  seems  extremely  unlikely,  and  in  any  event  would  be 
detected  when  the  rational  values  $i,  s2,  S3,  s4  are  converted  to  values  in  Zp.  The  probability  of  a  mistaken 
judgement  can  be  further  reduced  by  simply  doing  the  algorithm  twice  with  two  different  primes. 

Lewis  wrote  programs  to  create  the  third  matrix  and  compute  ksy  in  his  computer  algebra  system 
Fermat  [3].  One  method  is  to  compute  the  product  of  the  pivot  elements  that  come  up  as  one  normalizes 


Conservatively,  such  a  ksy  would  have  at  least  109  terms,  probably  more  like  1012. 
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(say,  into  the  Hermite  form)  the  third  matrix.  One  can  learn  the  rank  of  this  matrix  very  easily  by 
plugging  in  integers  at  random  for  the  four  q(  parameters  and  computing  a  matrix  normal  form.  The 
matrix  has  rank  26.  Therefore,  ksy  is  the  product  of  26  terms  that  will  appear  on  the  main  diagonal  as 
the  matrix  is  normalized.  Depending  on  the  algorithm,  these  terms  may  not  be  all  polynomials. 
Nevertheless,  the  product  of  all  26  will  be  a  polynomial. 

The  row  and  column  reductions  went  well,  up  to  the  17th  row/column.  Beyond  that  the  complexity  of 
the  computation  becomes  overwhelming.  However,  it  is  not  necessary  to  continue  the  normalization 
algorithm.  Recall  that  we  have  reason  to  think  that  any  maximal  rank  submatrix  will  do.  By  substituting 
random  integers  for  three  of  the  parameters  qt  it  is  easy  to  discover  a  26  x  26  maximal  rank  submatrix. 
ksy  is  just  the  determinant  of  this  fourth  matrix.  Since  all  its  entries  are  (4  variable)  polynomials,  the 
determinant  algorithm  in  Fermat  (there  are  several)  which  works  by  recursive  Lagrange  interpolation  is 
suitable.  It  completed  in  3  h  and  produced  a  ksy  with  around  500,000  terms.  (All  times  in  this  paper  are 
for  a  233  mHz  Macintosh  with  604e  chip.)  It  had  degree  26  or  25  in  each  of  the  four  qr  As  an  ASCII 
file,  this  ksy  occupied  a  file  of  3.5  Mb. 

To  evaluate  ksy  at  four  numerical  values  took  about  2  s,  so  this  is  feasible  in  real  time.  Extensive 
testing  with  2D  data  sets,  valid  and  invalid,  verified  the  correctness  of  ksy.  This  was  all  done  using  the 
prime  44,449.  Using  41,999  produced  essentially  the  same  results. 

Wishing  to  look  more  closely  at  ksy,  we  returned  to  the  idea  of  computing  it  by  row  reductions  on  the 
third  matrix,  over  the  field  Q,  rather  than  Zp.  The  first  nine  diagonal  pivot  elements  were  enlightening: 

<?4  -  qi,  qA  —  qi,  <?4  -  92>  92(^4  +  92),  92(94  +  92);  (93  +  9i)(94  ~  92)1 
(93  ~  9iX94  _  92)?  (93  —  2q\){qA  —  q2),  (93  —  l/2#i  +  1/6) (^4  —  ^)- 

This  suggests,  but  does  not  prove,  that  ksy  has  many  simple  factors.  After  much  testing  Lewis 
verified  that: 

94(94  -  92)4(93  -  9i)492(93  -  I/291  +  1/6)  (2) 

is  a  factor.  One  of  the  Fermat  determinant  algorithms  can  take  advantage  of  a  known  factor.  It  then 
computed  the  rest  of  ksy  (the  other  factor)  in  only  25  min,  down  from  the  original  3  h.  This  “reduced 
ksy ”  has  100,000  terms  and  occupies  only  670  K  of  disk  space.  Numerical  tests  show  that  the  actual 
resultant  is  indeed  a  factor  of  the  reduced  ksy. 

Even  more  extraneous  factors  can  be  removed  from  the  reduced  ksy.  First,  since  the  resultant  must  be 
irreducible,  we  may  divide  out  all  the  contents  of  ksy.  Secondly,  with  different  maximal  rank 
submatrices,  simple  variations  of  (1)  divide  their  determinants  (and  this  remains  true  for  different 
choices  of  3D  invariants,  not  just  the  values  used  here,  3,  4,  2,  3,  2,  3,  1,  2,  1/2).  Thus,  it  is  not  hard  to 
compute  another  reduced  determinant  ksy',  and  the  true  resultant  should  be  a  factor  of  GCD{ksy,  ksy'). 
We  have  therefore  the  following  algorithm: 

res:— ksy, 

REPEAT 

Compute  new  reduced  ksy  using  a  new  maximal  rank  submatrix; 
ksy:=ksy/a\\  contents(fay); 
res:=GCD(res,  ksy) 

UNTIL  DONE 
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After  five  repetitions  of  this  loop,  the  polynomial  res  contained  only  300  terms!  It  was  small  enough 
to  be  factored  with  standard  algorithms.  The  factor  that  vanishes  on  a  known  2D  data  set  is: 

sixline  =  q]q2A  -  2qxq\  +  %q\  +  6q2q2q4  +  I2qxq2q4  -  60q2q4  -  2qxq2q4  +  2q]q4  +  28 qxq4 
~  ■+  8?3  -  2q\q2  -  Uqxq2q2  +  60<?2<?3  ~  16?i<?3  “  %<l\  ~  2%qxq2  +  8 q\. 

This  was  all  done  over  a  finite  field,  Zp.  But  the  coefficients  above  are  suggestively  small  integers. 
Indeed,  this  is  the  actual  resultant  over  Q,  not  just  over  Zp.  That  is  easy  to  prove:  recall  that  each  q, 
may  be  solved  for  in  <i,  =  0,  then  just  substitute  into  sixline  each  qt  with  its  formula  in  terms  of  the 
other  variables.  The  expression  evaluates  to  0.  It  is  as  if  we  had  set  out  to  use  the  Chinese  Remainder 
Theorem  to  find  the  resultant  over  Q,  and  discovered  that  one  prime  was  enough. 

In  summary,  the  polynomial  sixline  provides  the  solution  to  the  problem  for  the  given  particular  3D 
data  set.  If  any  set  of  2D  invariants  be  presented  in  the  future,  plug  them  into  sixline.  If  the  result  is  not 
0,  then  they  do  not  represent  a  perspective  projection  of  the  original  object. 

Now,  our  entire  method,  which  we  know  has  worked  because  sixline  is  verifiably  correct,  is  based  on 
the  Kapur-Saxena-Yang  idea  of  computing  the  determinant  of  a  maximal  rank  submatrix.  In  [2]  they 
show  that  the  resultant  must  be  a  factor  of  any  such  determinant,  provided  that  a  certain  condition 
holds.  This  (sufficient)  condition  is  that  some  column  in  the  105x105  second  matrix  be  linearly 
independent  of  all  the  others.  However,  in  our  case  the  condition  fails!  Yet  the  method  works  anyway. 

It  may  be  asked  why  it  was  necessary  to  produce  the  polynomial  sixline  at  all.  Instead,  one  could 
simply  take  a  candidate  set  of  2D  invariants  and  plug  them  into  the  third  matrix,  whose  rank  is  known  to 
be  26.  If  the  rank  drops,  which  is  surely  a  simple  thing  to  check,  then  the  determinant  of  every  maximal 
rank  submatrix  must  vanish  on  that  2D  set. 

To  answer,  there  are  several  reasons  why  the  derivation  of  the  polynomial  sixline  is  very  desirable: 

•  It  is  not  clear  that  the  g.c.d.  of  all  the  maximal  rank  submatrices  is  exactly  the  resultant.  If  it  is  not, 
there  may  be  spurious  zeros. 

•  The  2D  invariants  {qx,  q2,  q2,  q4}  will  probably  be  obtained  by  extracting  and  measuring  lines  on 
photographs.  Tt  is  necessary  to  match  the  six  2D  lines  with  the  six  lines  on  the  original  3D  object. 
This  will  probably  require  testing  all  6! =720  possible  permutations.  The  time  saved  in  plugging  the 
{<?,}  into  sixline  versus  finding  the  rank  of  the  third  matrix  may  not  be  significant,  but  it  will  be 
multiplied  by  720. 

•  We  have  been  assuming  that  the  2D  invariants  are  known  exactly,  but  if  they  come  from  measure¬ 
ments,  there  may  be  errors.  Error  analysis  is  much  easier  if  it  is  based  on  the  polynomial  sixline. 

•  In  the  following  sections  we  generalize  our  method  to  produce  a  completely  symbolic  version  of 
sixline-,  i.e.,  we  forego  plugging  in  numerical  values  for  the  nine  3D  parameters,  and  all  13  variables 
appear  in  the  resultant. 

•  It  is  possible  to  consider  recognition  of  n  lines  by  similar  methods.  However  six  is  the  minimum  for 
the  problem  to  be  meaningful;  sets  of  five  or  fewer  lines  cannot  be  distinguished  in  this  manner. 


5.  Phase  Two 

In  Phase  One  we  substituted  numerical  values  for  all  parameters  except  qh  q2,  <?3,  q4.  Lewis  then 
redid  the  computations  keeping  various  other  subsets  of  the  parameters,  such  as  the  four  ptJ,  the  set 
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{ Aj ,  A2,  g,  nn,  /i22},  and  various  combinations  of  the  preceding  with  some  of  the  qt.  In  this  way  we 
learned  the  degree  of  the  resultant  in  all  of  its  parameters.  Each  degree  is  either  1  or  2.  We  learned  also 
that  if  we  order  the  parameters  so  that  the  four  qt  have  highest  precedence,  the  leading  term  is 
/(Ai,  A2,pn,p22>Pi2,/>2i)<7?<?4»  for  some  polynomial /in  the  indicated  parameters  only. 

6.  Phase  Three 

The  work  done  in  Phase  One  constitutes  a  viable  solution  to  the  Six-Line  Problem,  given  the  3D  data 
of  an  object.  But  we  want  to  compute  the  complete  resultant  for  all  objects,  in  all  13  parameters. 

Grosshans  et  al.  [1]  were  the  first  to  compute  this  polynomial  res,  using  invariant  theory  and 
experimenting  with  lots  of  numerical  cases,  observing  various  dependencies  among  the  variables  and 
exploiting  various  symmetries  in  the  equation.  They  found  a  res  with  239  terms.  The  final  answer  is 
quartic  in  the  3D  invariants  and  quartic  in  the  2D  invariants,  yielding  a  total  degree  8.  An  alternative 
approach  by  Stiller  and  Ma  used  interpolation  by  generating  a  large  number  of  “matching”  object 
image  pairs  and  exploiting  the  degree  bounds  predicted  by  Lewis.  How  do  we  know  this  polynomial  is 
correct?  Recall  that  each  <?,-  occurs  only  in  equation  d,-  =  0  and  can  be  solved  for,  yielding  a  rational 
expression,  for  example: 

q\  =  (<?A2<322  -  *Aifli2fl21«22  -  ”22 A2<321'322  +  ”11^2^21^22  +  2gA20i2022 

— gAjfl^O^  —  niln22^2a22  +  ”11^2^22  +  S^^2a22  +  ”22Al0|202,022  —  ”11  A]42i2^21a22 
~g  A2fl21a22  —  ^10)2^21^22  +  ”  1 1  ”22  ^2<*  1 2^2 1  ^22 

— 2/122 A20 12^21^22  +  ”1 1  A2<3l2«21<322  ~  g2 ^2^X2(12^22  +  1  «22 -^1 « 1 2^2 1  <^22 

+”22  Al  O)2<221022  —  2”ll  A]ai2021a22  —  g2AlO]2fl2l£22  —  ?A2fl2l”22 
+gX2^2\2a22  ~  g^\a2\2a22  ~  ”l  l”22A2ai2022  +  ”ll  A2Ol2^22  +  g2 ^2^\2a22 
+”l  1  ^*22 A 1  £2 ] 2^22  ~  ”l  1 A1O12O22  —  g2^l&l2a22  +  g^lOl2a2\  ~  ”ll”22A]0^2fl2] 

+”22AiOi2fl2i  +S2^\a2\2ali  -  8*2a\2a2\  +  2gAiOi2021  +  ”n”22A2tf?2a2i 

—ri22^2^2\2a2\  —  ^2A2fl[2^21  ~  ”l l”22Al02202l  +  ”22Al0^2fl21  +  g2^\G\2a2\ 

~  gA20l2”21  +5Al<2l2”2l)/(gA20l2fl21^22  —  g^lal2^21^22  ~  n22^202\^22  +  ”22Ai02io|2 

+  gA[  A2ai2/222  —  <?A|Ol2<222  _  ”22 A 1  A2022  +  ”22 A.  1  <a|2 

4-/11^201201,022  “  ”llAiOi2a2la22  —  gA 202\022  +  gAi02,022 
-gAiA20^2«2l”22  +  2gA2a^2a2lfl22  —  gAi0^2a2l022  +  ”22  A 1  A20 1 202 1 022 
-I-/I11A1  A20i202|022  —  2/I22A2O12O21O22  +  ”l  1 A2O12O21 022  +  ”22Al0i202l022 
— 2/lnAiai2a2i022  —  gA[  A2O21O22  —  gA202]022  +  2gAi02i022 
+gA]A20^2fl22  ~  gAi0,2022  —  /I22A1 A2O12O22  +  ”11 A1A2O12O22 
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2  2 

+n22\\a\2(i22  —  n\\X\a\2a22  -  gX\X2azi  +  gX\a22  -  n\\*\A2&\2a2\ 

+«llA 2<3i2a21  +  gM^2ai2<*2\  ~  g^2<*  12^21  ~  g^\^2o\2a2l  +  g^2^\2a2\ 

+n22^\^202i2a2l  -  ni\X\X2a2l2a2\  —  «22^2<312<321  +  nil^20\2a2\ 

+gX\X2a\202\  —  g^20\202l) 

Lewis  simply  substituted  for  each  qt  its  expression  as  above  into  res -and  checked  that  the  result  is 
identically  (symbolically)  0.  (200  meg  of  RAM,  1 1  min,  using  Fermat.  No  other  computer  algebra 
system  that  we  are  aware  of  could  do  this  computation.) 

We  felt  strongly  that  the  Dixon-KSY  method  ought  to  work  as  well  to  compute  res.  But  recall  that 
even  after  plugging  in  integers  for  9  of  the  13  parameters,  the  KSY  method  produced  a  500,000  term 
answer,  almost  all  of  which  were  spurious  factors.  Brute  force  is  therefore  rejected.  Several  ideas  led 
eventually  to  the  solution. 

The  first  idea,  due  to  George  Nakos,  is  as  follows.  Instead  of  applying  the  KSY  method  to  four 
equations  { d ,•  =  0}  to  eliminate  the  three  variables  {aX2,  fl2i»  ^22}.  do  it  in  stages: 

•  Apply  KSY  to  [dx,  d2,  d3)  eliminating  two  variables,  obtaining  a  polynomial  yx  that  still  has  ai2. 

•  Apply  KSY  to  {d2,  d2,  d4)  eliminating  two  variables,  obtaining  a  polynomial  y2  that  still  has  ai2. 

•  Apply  KSY  to  [yu  y2 )  to  get  the  final  res. 

However,  it  is  not  that  easy.  Each  y,  would  have  had  many  millions  of  terms,  making  the  third  step 
hopeless.  Lewis  applied  two  fairly  standard  ideas  to  reduce  the  size  of  each  yf. 

•  Interpolation :  Plug  in  authentic  3D  values  for  some  of  the  parameters.  Run  the  above  three  steps  with 
enough  such  sets  Of  values,  then  construct  res  with  standard  interpolation  techniques. 

•  Quotient  ring:  We  know  that  the  final  answer  res  is  of  low  degree  in  each  parameter;  for  example,  it 
is  degree  2  in  g  and  degree  1  in  nn.  Analogously  to  working  over  Zp  instead  of  Z,  we  could  work 
modulo  a  cubic  polynomial  in  g  and  a  quadratic  polynomial  in  nn.  This  eliminates  high  degree  (in  g 
and  nn)  intermediate  results  while,  ideally,  not  changing  the  final  answer.  Fermat  allows  one  to  work 
easily  and  efficiently  over  such  fields. 

While  either  technique  alone  might  have  sufficed,  we  decided  to  use  both.  There  is  a  problem, 
however,  with  the  second  technique,  the  well  known  leading  coefficient  problem.  Suppose  R  is  a 
polynomial  ring,  say  R  =  F[a,  b,c,.. .].  Let  IcR  be  an  ideal  such  that  RII  is  a  field.  We  wish  to  compute 
in  (R/I)[x,  y,  z, . . .]  instead  of  R[x,  y,  z, . . .].  When  working  over  such  a  quotient  field,  algorithms  such 
as  polynomial  g.c.d.  dispense  with  leading  coefficients  involving  the  field  variables  a,  b,  c,  . . ..  The 
leading  coefficients  are  divided  through  to  produce  “pseudo-monic”  polynomials.  This  makes 
reconstruction  of  the  actual  answer  in  J?[x,  y,  z,  . . .]  problematic.  But  due  to  the  work  accomplished  in 
Phase  Two,  we  know  that  the  leading  term  relative  to  the  qt  is  f{X\ ,  A2.p1 1  .P22.P12.P21  for  some 
polynomial/in  the  indicated  parameters.  Therefore,  by  choosing  to  mod  out  by  g  and  nn  we  avoid  this 
problem.  (We  could  mod  out  by  n22  in  addition,  but  that  greatly  slows  down  the  computations  in 
Fermat.) 

In  summary,  we  chose  to  work  modulo  g3— 3  and  n^  —  7,  and  over  the  prime  p=  17041. 
2 p[g,  ni  i]/(g3  -  3,  n2u  -  1)  is  a  field.  We  interpolated  for  Xu  X2  and  n22.  We  know  from  Phase  Two  that 
the  answer  is  of  degree  1  in  each  of  the  latter  parameters,  so  we  need  to  run  the  three  steps  eight  times. 
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However,  it  still  does  not  work.  >'i  and  y2  each  have  about  300,000  terms  and,  worse  yet,  are  of 
high  degree  (>30)  in  al2.  That  makes  the  third  step  unworkable.  The  problem  is  solved  by  recalling 
from  Phase  One  the  idea  of  dividing  out  by  the  contents.  Compute  (12  min).  Then  compute  all 
its  contents  and  divide  out  by  them  (123  min;  132  meg  RAM).  The  result  has  only  90  terms!  Repeat 
for  y2.  Then  do  the  third  step  (about  1  min).  This  produces  a  preliminary  answer  with  a  set  of  values 
plugged  in  for  Aj,  A2  and  n2 2.  We  then  repeat  seven  times  and  interpolate  for  the  final  answer.  In  doing 
so,  one  final  problem  arises.  Because  the  contents  were  divided  out  often,  the  eight  preliminary  answers 
may  be  missing  leading  numerical  coefficients  -  another  incarnation  of  the  leading  coefficient 
problem.  Especially  likely  is  that  one  or  more  needs  to  be  multiplied  by  —1.  Since  we  know  that 
the  final  answer  is  a  polynomial  with  integer  coefficients,  it  is  easy  to  experiment  and  compute  the 
right  answer. 


7.  Conclusion 

Elimination  in  stages  using  the  Cayley-Dixon-Kapur-Saxena-Yang  method  succeeded  for  two 
reasons: 

1 .  The  final  answer  is  of  low  degree  in  most  of  its  variables  (in  fact,  all  of  them). 

2.  At  each  stage,  polynomials  are  produced  that  are  multiples  of  the  resultant,  with  huge  spurious 
factors.  But  the  resultant  is  the  only  factor  involving  all  the  variables.  It  can  therefore  be  found  by 
dividing  out  all  the  contents. 

Unless  there  is  something  very  special  about  the  equations  that  came  up  in  this  problem,  it  is 
reasonable  to  conjecture  that  our  successive  elimination  method  with  KSY  may  be  applicable  to  other 
large  polynomial  systems. 
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Appendix  A 

A.  The  four  equations 


•  Nine  3D  parameters:  Aj,  A2,  n„,  n22,  g,  pn,  p:2,  p2 1,  Pn- 

•  Four  2D  parameters:  q\,  q2,  q2,  q^. 

•  Four  conversion  variables  (later  we  set  un=l):  au,  ai2,  a2t,  a22. 
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•  Four  equations  d\,  d2 ,  d 3,  d 4  in  the  three  variables  a,,  and  the  13  parameters.  Note  that  g,  appears  only 
in  di  and  only  with  exponent  1. 

d\  =  (ga2u  +  ga\ici2\  +ni2a\\ayi  +  011022^22  —  >*11011012  —  0i202i>*ii 

^12fl22)(“ ^2^12^21  “  ^2^21^22  +  ^\alla22  +  ^1^21^22  “  ^^2^11^22 

+AiA2<2i2^2l)9l  “  {-g<l\\a 21  “  £021  “>*22012021  —  ^22^21^22  +  ^11^11^22 

+>*11021022  +  £012^22  +  £022  “  fll  la22>*l  1>*22  +  011022£2  +  012021  >*1 1  >*22 

— 01202l£2)(A2011012  +  011022 A2  “  Ai^n012  “  01202lAi), 

d2  =  (£0JJ  +  £011021  +  ^22011012  +  011022^22  ”l  1011^12  "  012021>*11 

-£0 12  “  £012022) (01 1022  -012021  -0H022A2  +  012021  Ai  )<?2  “  (011022  -012021  “£011021 
“01 1022^22  +  012021^11  +  £012022)(A2011012  +  011022A2  “  ^\Ct\\Cl\2  “  012021  Ai), 
d'l  =  (/?1202i  +  P12011021  +  P22alla\2  +  011022^22  “Pi  1011012  “  01202lPll 
— P210^2  —  P21012022)(— A2012021  “  ^2^21^22  +  ^lan&22  +  Aj<221022  “  AiA2<3il022 
+Ai  A2fll202l)?3  “  (“ P12011021  “* P12021  “  P22012021  “ P22^2\^22  +  P11011022 

+P11021022  + P2 10 12022  +  P2\^22  “  a\\a22P\\P22  +  01  \^22P\2P2\  +  01202lPl lP22 
“ 01202lPl2P2l)(A2011012  +  01 1022 A2  —  A]  11^12  “  012021  Ai), 
d*  —  (pi202 1  +P1201102I  +P22011012  +  0 1 1 022P22  “ P11011012  “  01202lPll 
“ P210J2  “  P21012022)(011022  “  012021  “  011022A2  +  012021  Ai)<?4  “  (011022 
—012021  —  P12011021  —  011022P22  +  012021P11  +  P2\^\2^22){^2^\\^\2  +  a\\a22^2 

— Ai<2uai2  -  012021  Ai). 


References 

[1]  F.  Grosshans,  R.  Gleason,  R.  Williams,  M.  Hirsch,  August  1997,  personal  communication. 

[2]  D.  Kapur,  T.  Saxena,  L.  Yang,  Algebraic  and  geometric  reasoning  using  Dixon  resultants,  Proceedings  of  the 
International  Symposium  on  Symbolic  and  Algebraic  Computation,  1994,  ACM  Press,  New  York. 

[3]  R.H.  Lewis,  Computer  algebra  system  Fermat,  http:  llw ww.bway.net/~lewis/,  http://www.fordham.edu/lewis/,  1998. 

[4]  L.  Quan,  Computation  of  the  invariants  of  a  point  set  in  P3  from  its  projections  in  P2>  in:  N.L.  White  (Ed.),  Invariant 
Methods  in  Discrete  and  Computational  Geometry,  1994,  pp.  223-244. 

[5]  P.  Stiller,  Symbolic  computation  of  object/image  equations.  Proceedings  of  the  International  Symposium  on  Symbolic 
and  Algebraic  Computation,  1997,  ACM  Press,  New  York,  pp.  359-364. 

[6]  P.  Stiller,  C.  Asmuth,  C.  Wan,  Single-view  recognition  -  the  perspective  case,  Proceedings  SPIE  International 
Conference,  Vision  Geometry  V  2826,  vol.  8,  1996,  pp.  226-235. 


Geometric  Hashing  and  Object  Recognition 

Peter  Stiller  and  Birkett  Huber 

Institute  for  Scientific  Computation 
Texas  A&M  University 
College  Station,  TX  77843-3404  USA 


ABSTRACT 

We  discuss  a  new  geometric  hashing  method  for  searching  large  databases  of  2D  images  (or  3D  objects)  to  match  a 
query  built  from  geometric  information  presented  by  a  single  3D  object  (or  single  2D  image).  The  goal  is  to  rapidly 
determine  a  small  subset  of  the  images  that  potentially  contain  a  view  of  the  given  object  (or  a  small  set  of  objects 
that  potentially  match  the  item  in  the  image).  Since  this  must  be  accomplished  independent  of  the  pose  of  the 
object  the  objects  and  images,  which  are  characterized  by  configurations  of  geometric  features  such  as  points,  lines 
and/or  conics,  must  be  treated  using  a  viewpoint  invariant  formulation.  We  are  therefore  forced  to  characterize  these 
configurations  in  terms  of  their  3D  and  2D  geometric  invariants.  The  crucial  relationship  between  the  3D  geometry 
and  its  “residual”  in  2D  is  expressible  as  a  correspondence  (in  the  sense  of  algebraic  geometry).  Computing  a  set  of 
generating  equations  for  the  ideal  of  this  correspondence  gives  a  complete  characterization  of  the  view  independent 
relationships  between  an  object  and  all  of  its  possible  images.  Once  a  set  of  generators  is  in  hand,  it  can  be  used  to 
devise  efficient  recognition  algorithms  and  to  give  an  efficient  geometric  hashing  scheme.  This  requires  exploiting  the 
form  and  symmetry  of  the  equations.  The  result  is  a  multidimensional  access  scheme  whose  efficiency  we  examine. 
Several  potential  directions  for-improving  this  scheme  are  also  discussed.  Finally,  in  a  brief  appendix,  we  discuss  an 
alternative  approach  to  invariants  for  generalized  perspective  that  replaces  the  standard  invariants  by  a  subvariety 
of  a  Grassmannian.  The  advantage  of  this  is  that  one  can  circumvent  many  annoying  general  position  assumptions 
and  arrive  at  invariant  equations  (in  the  Pliicker  coordinates)  that  are  more  numerically  robust  in  applications. 

1.  INTRODUCTION 

Content-based  retrieval  of  information  from  large  databases  that  keys  on  visual/geometric  information  contained  in 
images,  schematics,  design  drawings,  and  geometric  models  of  environments,  mechanical  parts  or  molecules,  etc.,  will 
play  an  increasingly  important  role  in  future  distributed  information  and  knowledge  systems.  This  paper  focuses 
on  two  aspects  of  geometric  content-based  retrieval  for  knowledge  acquisition.  The  first  concerns  geometric  hashing 
techniques  for  matching  geometric  configurations  of  features  in  a  database  of  3D  objects  to  a  geometric  configuration 
of  features  in  a  single  2D  image  or,  vice  versa,  matching  geometric  configurations  of  features  in  a  database  of  2D 
images  to  a  geometric  configuration  of  features  on  a  single  3D  object.  The  second,  dealt  with  in  an  appendix, 
describes  an  alternative  to  classical  invariants  that  associates  to  a  particular  geometric  configuration  an  invariant 
subvariety  of  a  Grassmannian.  Equations  for  this  subvariety  in  the  Pliicker  coordinates  of  the  Grassmannian  serve 
as  “invariants”  of  the  configuration.  The  advantage  of  these  “invariants”  is  that  one  can  circumvent  many  annoying 
general  position  assumptions;  resulting  in  more  numerically  robust  versions  of  the  object/image  equations  used  to 
match  3D  and  2D  configurations  of  features. 


2.  THE  SET-UP 

In1’2  we  describe  a  technique  for  using  geometric  information  contained  in  2-D  images  to  search  large  databases  of 
3-D  models  in  the  special  case  where  the  geometric  information  consists  of  finite  point  configurations.  This  technique 
exploits  certain  polynomial  relations  known  as  object/image  equations  between  invariants  assigned  to  the  2-D  an 
3-D  feature  sets.  The  resulting  scheme  is  independent  of  changes  in  scale  and  perspective.  Here,  we  descri  e  a 
technique  for  constructing  a  hashing  scheme  based  on  this  technology. 

Recall  that  the  object/image  equations  are  polynomial  relations  in  the  combined  set  of  geometric  invariants 
associated  to  a  3D  configuration  and  those  associated  to  a  configuration  in  a  2D  image.  They  completely  descri 
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the  mutual  3D/2D  constraints234,  and  can  be  used  in  a  number  of  ways.  For  example,  from  a  given  2D  configuration 
one  can  determine  a  set  of  non-linear  polynomial  constraints  on  the  geometric  invariants  of  those  3D  configurations 
capable  of  producing  the  given  2D  configuration  in  an  image.  This  results  in  a  test  for  limiting  the  possibilities  for 
the  object  is  being  viewed.  When  applied  to  searching  a  database  of  objects  to  find  likely  matches  to  a  given  image, 
this  test  can  then  be  used  as  a  filter  to  remove  objects  from  further  consideration,  leading  to  at  a  greatly  reduced  set 
of  objects  which  must  be  more  carefully  considered.  While  the  object/image  formalism  applies  to  much  more  general 
feature  sets,5  the  case  of  point  configurations  is  especially  simple.  Its  performance  for  single  view  recognition  of 
point  configurations  was  investigated  in.1  In  this  paper  we  describe  the  use  of  these  equations  for  setting  up  an 
index  (hashing  scheme)into  a  large  data  base  of  3D  point  configurations  for  rapid  query  by  2D  images.  In  order  to 
rule  out  groups  of  objects  the  index  uses  the  object/image  equations  to  determine  if  any  object  in  a  region  of  the 
“invariants  space”  for  objects  matches  a  giver,  image. 

We  will  work  in  the  general  perspective  case.  To  facilitate  this,  ordinary  points  (x,y,  z)  €  JR3  or  (u,v)  €  JR2  are 
represented  in  homogeneous  coordinates  in  the  respective  projective  space  2P3  or  projective  plane  P2: 


/  x\ 


y 

z 


6  P3 


VI/ 


or 


€  P2 


Image  formation  is  accomplished  by  a  general  perspective  projection  M,  which  is  a  3  x  4  matrix  of  rank  3.  Viewed 
as  a  rational  map  from  P3  to  P2,  M  takes  (x,y,  2)  to  (ti,v)  where  u  =  ^  and  v  =  ^  and 
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This  transformation  is  undefined  at  one  point  (a  :  (3  :  7  :  <S)  €  P3,  which  is  the  focal  point  of  the  projection  (the  null 
space  of  M).  A  typical  example  is 

/  1  0  0  0  \ 

0  10  0 

V  o  o  1  1  / 


which  takes  (x,  y,  2)  to  (u,t/)  where 


x  y 

-  v  — - 

2+1  2+1 


and  which  is  undefined  at  (0  :  0  :  -1  :  1).  This  is  perspective  projection  onto  the  xy- plane  through  the  point 
(0,0,  -  I).  Such  projections  are  examples  of  the  standard  “pinhole  camera”  model  of  image  formation.6 

In  general,  we  will  have  a  parameter  space  of  point  configurations  in  3D  or  2D.  These  features  will  be  subject 
to  general  projective  transformations  (the  group  PGL4  in  space  or  PGL3  in  the  plane)  resulting  in  a  notion  of 
equivalent  configurations.  This  physically  reflects  a  “change  of  point  of  view”  in  P3  or  in  P2.  The  basic  invariants 
we  construct  are  functions  of  the  configurations  that  remain  unchanged  under  a  projective  transformation.  They 
provide  coordinates  on  a  portion  of  the  space  of  equivalence  classes  of  configurations.  In  practice,  one  must  deal 
with  several  of  these  “coordinate  patches” .  (To  avoid  “coordinate  patches”  by  using  more  “global”  invariants,  see 
the  appendix.) 


3.  THE  OBJECT-IMAGE  EQUATIONS 

With  the  3D  and  2D  invariants  in  hand,  we  can  construct  the  basic  equations  between  them  that  reflect  the  com¬ 
plete  set  of  view  independent  constraints  that  must  hold  between  3D  configurations  and  their  images  in  2D  under 
perspective  projection.  It  will  be  convenient  to  represent  configurations  of  m  points  in  Pd  by  (d  +  1)  x  m  matrices 
A  =  [Pi | . . .  |Pm],  whose  columns  are  homogeneous  coordinates  for  the  points  P,.  As  homogeneous  coordinates  are 
defined  only  up  to  scalar  multiples,  we  can  multiply  A  on  the  right  by  an  arbitrary  invertible  mxm  diagonal  matrix 
without  changing  the  point  configuration  it  represents.  We  also  see  that  multiplying  on  the  left  by  an  invertible 
{d  +  1)  x  (d  +  1)  matrix  has  the  effect  of  applying  a  common  projective  transformation  to  the  points  of  A.  Thus  two 
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(d+  1)  x  m  matrices  represent  equivalent  objects  if  we  can  convert  one  to  the  other  by  elementary  row  operations  and 
non-zero  column  scaling.  Using  a  slightly  modified  version  of  Gaussian  elimination,  we  can  compute  a  fundamental 
set  of  invariants  for  a  matrix  representing  m  points  in  space  (i.e.  in  P3)  by  moving  the  first  5  points  to  a  “standard 
position.” 
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Similarly  for  points  in  P2  we  move  the  first  4  points  to  standard  position: 
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In  effect  we  are  moving  Pi  to  the  point  in  projective  space  with  homogeneous  coordinates  [1  :  0  :  0  :  0],  P%  to 
[0  :  1  :  0  :  0]  and  so  on,  with  P5  moving  to  [l  :  1  :  1  :  1].  This  presupposes  that  Pi,  P2, P3, Pa  are  not  co-planar  and 
that  P5  is  not  co-planar  with  any  three  of  Pi,  P2,  P3,  P4,  i.e.  we  are  assuming  that  Pi, ...» P5  are  in  general  position. 
Actual  generators  for  the  field  of  invariant  rational  functions  under  projective  transformations  on  the  moduli  space 
of  m-tuples  of  points  in  P3  are  { £1^  for  z  =  6, . . . ,  m|.  (Strictly  speaking,  to  have  a  reasonable  moduli, 
one  must  restrict  to  “stable”  or  “semi-stable”  m-tuples.9) 

These  particular  functions  provide  coordinates  on  an  open  subset  of  the  quotient  space  of  m-tuples  modulo  the 
action  of  PGL4,  the  group  of  projective  transformations  on  P3.  This  open  subset  ^...,3  is  defined  by  the  non¬ 
vanishing  of  the  coordinates  pi, 3  for  z  =  6, . . . ,  m  in  addition  to  the  general  position  assumption  on  PL, . . . ,  P5.  We 

can  similarly  define  T ^ dm  to  be  the  open  subset  defined  by  the  general  position  assumption  on  Pi, . . .  ,P5  along 

with  the  non- vanishing  of  p ^  for  z  =  6, . . .  ,m  and  analogously  arrive  at  affine  coordinates  on  each  . 

For  reasons  of  numerical  stability  and  in  order  to  avoid  imposing  any  assumptions  on  the  points  aside  from  the 
general  position  of  Pi, . . . ,  P5,  we  work  with  the  homogeneous  coordinates  pij  rather  than  with  any  particular  set  of 
affine  coordinates  (ratios  of  the  Pij), 

If  some  other  subset  of  five  points  from  among  Pi , .  - . ,  Pm  were  in  general  position,  while  P1? . . . ,  P5  were  not,  we 
would  simply  work  in  a  different  open  subset  of  the  quotient  manifold  with  the  invariant  coordinates  on  that  open 
subset;  making  use  of  the  obvious  change  of  coordinates  to  adjust  our  formulas  below. 

The  object-image  equations  for  this  case  were  derived  for  the  affine  coordinates  above  in,4  and  reformulated  for 
homogeneous  coordinates  in.1 

Theorem  3.1.  are  the  images  o/Pi,...,Pm  under  a  perspective  transformation  then  the  relations 

Elj  =  0  and  E2q t*  =  0  hold  for  j  =  6, . . . ,  m  and  k  =  7, . . . ,  m,  where 

Elj  -  (“Pi,0Pi,l  +  Pj.OPjf  ,3  )  95  ,0  9j,2  + 

(“Pj,0Pj,3  +  Pi,2Pi,o)9i,l95,0  + 

(  Pi,2Pi,l  +Pj,3Pi,l)<?5,l9j.O  + 

(Pj\2Pjf,l  “  Pi, 2Pj, 3 )9i, 095,2  + 

(“Pi,3Pj.l  +Pi,OPi,l)<75,l<?j,2  + 

(-Pi,2Pi,0  +Pi,2Pi,3)9i,l<?5,2 

E26'k  =  (“PfctlP6,0  +  Pfc,3P6.o)<7s.O<76,2^,2  + 

(“Pit,  3P6, 0  +Pfc,2P6, o)95,096, 2Qk,l  + 

(“Pit,2P6,3  +  PJc,3P6,2)<?5,2tf6,0<?;U  + 

(PMP6,0  “  PJfc,lP6.3)<?6,2<75,l<7ic,2  + 

(Pfc.2p6,3  “  Pfc,2P6,o)<75.2<?6.29fc,l  + 

(PMP6,3  “  PiUP6,2)g6,0<75,i<?Jt,2  + 
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(PMP6, 2  -pjfc,3P6,2)$5,2<?6,0<?fc,2 


Furthermore  for  any  k  >  7  tue  can  interchange  the  roles  of  the  indices  k  and  6  tn  E2q^  (i.e.  use  E2k$  instead 
E2^k).  Since  these  equations  are  all  of  a  fixed  size,  we  can  test  the  object/image  correspondence  in  time  linear  in 
in.  Looking  carefully  at  them  one  sees  that  no  variable  pj,t  or  Qj.i  appears  in  any  term  with  degree  higher  than  1. 
This  fact  will  be  very  useful  when  setting  up  our  indexing  schemes  because  it  allows  us  to  determine  if  the  equations 
can  have  solutions  in  coordinate  boxes: 

PROPOSITION  3.2.  If  F  is  a  real  valued  multi-linear  function  in  the  variables  X\ and  B %  :=  {z  €  iRn|o,  < 
Zi<  fa,  i  ss  1 . . .  ,n},  then  F  achieves  its  extreme  values  over  Bf  at  the  vertices  of  B%. 

COROLLARY  3.3.  F(x)  is  non-zero  for  all  x  in  B%  if  and  only  if  F  is  positive  on  all  vertices  of  B%  or  negative  on 
all  vertices  of  B%. 

Once  the  projective  invariants  (the  qij)  for  an  image  are  specified  and  the  object  invariants  are  treated  as  affine 

coordinates  on  a  particular  T ^ the  polynomials  (dehomogenized  with  respect  to  the  pidi  i  =  6, . . . ,  m)  become 

multi-linear  functions  of  those  3(m  -  5)  object  invariants.  Naively  this  tells  us  that  to  check  if  an  equation  Elj  or 
E2jik  vanishes  in  a  box  in  “object  space”  we  need  23m“15  evaluations.  But  looking  carefully  at  the  equations  Elj 
we  see  that  they  each  involve  only  the  homogeneous  coordinates  for  Pj ,  and  thus  in  any  of  our  affine  charts  involve 
only  three  variables.  Since  £26,fc  and  E2k,e  involve  only  the  homogeneous  coordinates  p6,t  and  pk,i  they  can  involve 
at  most  6  variables  in  any  affine  chart.  In  fact,  not  all  of  the  p6,i  and  pk>i  appear,  and  we  can  show  that  in  any  of 
the  affine  charts  at  least  one  of  these*  two  equations  will  involve  5  or  fewer  variables: 

Theorem  3.4.  Given  known  projective  invariants  for  an  image  configuration  of  points  and  a  box  Bf  in  some  affine 
chart  of  object  space ,  we  can  determine  if  Elj  vanishes  in  Bf  with  8  =  23  evaluations.  For  one  of  the  two  equations 
E26,k  and  E2k$  this  question  can  be  answered  with  at  most  32  =  25  evaluations. 

Note  that  the  Elj  and  E2^k  are  homogeneous  polynomials  in  the  homogeneous  coordinates.  Thus  while  it  does 
make  sense  to  talk  about  them  vanishing,  their  values  are  not  defined  when  they  don’t  vanish.  In  order  to  use 
the  equations  to  measure  how  close  an  object  and  image  come  to  matching,  one  must  resolve  this  ambiguity  by 
specifying  a  way  to  dehomogenize.  In  order  to  be  able  to  use  Theorem  3.4,  we  scale  the  “projective  invariant”  points 
(i.e.  columns  6  through  m  of  the  “standard  position”  matrix)  for  the  object  so  that  all  coordinates  have  absolute 
value  less  than  or  equal  to  one,  and  at  least  one  coordinate  in  each  column  is  equal  to  one.  For  the  image  invariants 
we  are  free  to  choose  any  particular  normalization  we  please.  In,1  we  investigated  the  performance  of  several  choices 
of  normalization  in  the  presence  of  noise. 

4.  INDEXING/HASHING 

In  this  section  we  will  describe  an  indexing  scheme  based  on  the  object-image  formalism  and  then  describe  its 
implementation  in  the  case  of  point  configurations  as  described  above.  Here  we  are  assuming  a  large  data  base  of  3D 
models  to  be  queried  by  a  2D  image  for  all  objects  which  match  the  image.  For  generality,  we  describe  the  database 
as  storing  points  in  some  data-space ,  and  querying  by  proximity  to  a  query-loci.  Two  operations  are  supported: 
adding  points  (objects)  to  the  data  base  and  querying  it. 

The  basic  idea  is  that  will  we  represent  the  objects  in  our  data  base  as  points  in  the  appropriate  space,  and  use 
the  object-image  equations  to  determine  if  they  match  or  approximately  match  a  query  image.  In  other  words,  we 
will  be  looking  for  all  objects  on  or  near  an  algebraic  subset  of  the  object  space  defined  by  the  vanishing  of  the 
equations  in  the  previous  section  when  the  image  invariants  Qij  have  been  specified.  The  simplest  way  to  do  this  is 
to  simply  store  all  the  objects  and  check  the  equations  for  each  object  once  the  invariants  for  the  query  image  are 
known.  Our  approach  to  avoiding  the  necessity  of  checking  each  and  every  object  is  a  modified  multidimensional 
access  scheme.  For  a  survey  of  multidimensional  access  methods  we  refer  the  reader  to.8 

The  index  takes  the  form  of  a  tree  whose  nodes  B  (or  boxes)  consist  of  a  data-range ,  denoted  range(B),  repre¬ 
senting  a  subset  of  the  data-space  and  a  data  list  data(B).  If  B  is  a  leaf  of  the  tree,  the  data  list  will  contain  those 
data  points  in  range(B).  For  interior  nodes  B,  the  data  list  consists  of  boxes  Bi  whose  ranges  form  a  covering  of 
range(B)  by  subranges.  While  we  do  not  require  the  subranges  in  B  to  be  disjoint,  we  make  the  tacit  assumption 
that  the  overlap  is  small  and  very  few  objects  lie  in  more  than  one  box  of  a  subdivision. 
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Figure  1.  Indexing  Example 

The  data  base  begins  with  a  list  of  top-level  boxes  which  cover  the  data-space  and  is  built  up  by  adding  data- 
points  one  at  a  time.  Adding  a  point  P  to  a  box  B  proceeds  as  follows:  If  P  is  in  range(B)  then  we  add  P  to  the 
list  data(B)  if  B  is  a  leaf,  or  we  add  P  to  each  node  on  the  list  data(B)  if  B  is  interior.  Whenever  the  data  list  of  a 
leaf  becomes  too  full  it  is  made  an*  interior  node  by  subdividing  its  range  and  re-adding  it’s  data  points.  Querying 
proceeds  similarly,  one  starts  a  query  at  each  top  level  box  and  descends  to  query  children  any  time  the  query-locus 
passes  through  the  boxes  data  region. 

In  order  to  implement  this  scheme,  one  must  specify  the  data-space,  query-loci,  data-ranges  as  well  as: 

•  A  covering  of  the  data-space  by  data-ranges. 

•  An  algorithm  for  subdividing  a  range. 

•  An  algorithm  to  determine  if  a  data-point  lies  in  a  data-range. 

•  An  algorithm  to  determine  if  the  query-loci  passes  through  a  data-range. 

•  An  algorithm  to  determine  if  the  query-loci  matches  a  particular  data-point. 

One  embellishment  that  we  do  use  in  practice  is  to  maintain  two  ranges,  an  exterior  range  used  when  adding, 
points,  and  an  interior  range  used  when  searching.  The  interior  range  is  maintained  so  that  it  is  always  just  big  ; 
enough  to  hold  the  data  points  actually  in  the  box  or  in  its  subtree. 

As  a  simple  example,  we  might  take  our  data-space  to  be  composed  of  points  in  the  two  dimensional  box  [-1, 1]  )di 
[-1, 1],  our  query-regions  to  be  lines,  and  our  ranges  to  be  boxes  of  the  form  [a,  6]  x  [c,  d].  As  a  covering  we  just  use! 
the  box  [— 1, 1]  x  [-1, 1]  itself,  subdivision  cam  be  accomplished  by  dividing  a  box  with  sides  of  length  L  into  fbmj 
boxes  with  sides  of  length  ~  and,  of  course,  testing  to  see  if  a  point  lies  in  a  box  is  trivial.  As  for  querying,  if  thgi 
query  line  has  the  equation  Ax  +  By  +  C  =  0  we  use  the  magnitude  of  Ax  +  By  -b  C  as  an  indication  of  fit.  TH 
test  to  see  if  a  query-region  matches  anything  in  data-range  is  easy  if  we  recall  that  a  linear  functional  achieves  S 
maxima  at  the  vertices  of  a  box.  .M 

Our  indexing  scheme  for  object  data  bases  is  very  similar  to  the  example  above.  The  query-regions  will  be  subs? 
of  objects  matching  a  given  image.  The  data  space  is  described  by  the  4m“5  charts  mapping  subsets  of  the  fa 

. dm  °nto  a  copy  of  the  coordinate  box  ^  IR3^m~5^.  For  data-ranges  we  take  coordinate  boxes  ES 

^3m~i5  the  example,  determining  if  a  data-point  (an  object)  lies  in  a  data-range  is  trivial,  and  we  can  anf 

take  subdivision  into  23m“15  sub-boxes  by  dividing  each  edge  in  half.  Finally  Proposition  3.2  allows  us  to  chtf 
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query  locus  passes  through  a  box  efficiently  by  evaluating  the  individual  terms  Elj  and  E2qj  on  a  subset  of  the 
rertices 

The  performance  of  the  indexing  has  been  tested  and  the  results  will  be  reported  on  in  future  work. 

5.  APPENDIX:  INVARIANTS  REPRISED 

In  this  section  we  will  consider  a  more  geometric  approach  to  invariants  and  the  object/image  equations  used  in  our 
indexing  algorithm  above.  For  simplicity  we  will  consider  the  case  of  six  points  (x*,  7/j,  z{)  i  =  1, . . . ,  6  in  2R3 


( 

*1 

z 2 

*6 

y\ 

2/2 

...  ye 

Z\ 

Z2 

ze 

1 

1 

1 

arrayed  in  a  4  by  6  matrix.  Our  only  assumption  will  be  that  the  points  do  not  all  lie  in  a  plane,  or  equivalently  that 
the  rank  of  the  matrix  is  4. 

Since  we  plan  to  use  perspective  projection,  it  is  more  appropriate  to  work  in  projective  3-space  JP3  where  we 
use  homogeneous  coordinates  for  our  points.  (Hence  the  row  of  l’s  in  our  matrix.)  We  could  in  fact  carry  out  our 
discussion  starting  with  any  4x6  matrix  of  rank  4  none  of  whose  columns  are  zero 
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The  ambiguity  of  homogeneous  coordinates  can  be  captured  by 
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Each  cci  acts  as  a  scale  ?;vctor  in  the  ith  column. 

In  effect,  we  are  considering  six  points  in  P3  that  are  not  coplanar. 

The  principal  invariance  we  are  concerned 

with  is  invariance  under  the  action  of  projective  general  linear  group  PGL4  on  IP3.  These  transformations  are 
represented  by  the  action  of  4  x  4  invertible  matrices  on  the  left.  Notice  that  a  scalar  matrix  acts  trivially  in  this 
context  since  it  does  not  change  the  point  in  P3  that  each  column  represents. 

Now  viewing  M  as  a  linear  map  from  R6  to  P4,  we  have  a  kernel  (null-space)  K2  which  is  a  plane  in  R6  through 
the  origin.  Notice  that  if  we  multiply  M  on  the  right  by  a  4  x  4  invertible  matrix  A  the  new  4x6  matrix  Mf  =  AM 
stiU  has  E  as  kernel.  Thus  E  is  a  kind  of  invariant  for  M .  The  set  of  all  two  dimensional  linear  subspaces  of 
R  has  a  natural  structure  of  a  manifold,  the  Grassmanian  Gr(2, 6),  which  is  compact  and  8  dimensional.  We  can 
regard  E2  as  a  point  in  Gr(2,6). 

Now  we  are  working  with  6  points  in  P3  and  their  homogeneous  coordinates  as  the  columns  of  M,  and  so  we 
must  confront  the  problem  of  scaling  in  the  columns.  As  mentioned,  this  is  equivalent  to  acting  on  M  on  the  left  by 
D  =  D(a\, . . .  ,06)  a  diagonal  matrix.  If  oi  =  02  =  *  *  •  =  06  then  the  new  4  by  6  matrix  Mf  =  MD  has  the  same 
kernel  E2  as  M,  but  if  the  a*  are  not  all  equal  the  kernel  changes.  As  oi, . . . ,  o6  vary  we  get  a  family  V5  of  two 
dimensional  subspaces  of  P6,  namely  the  kernels  E2( Oi, . . .  ,06)  which  are  functions  of  Oi, . . .  ,06.  Vb  (actually  its 
Zariski  closure)  is  a  5-dimensional  toric  subvariety  of  Gr(2,6),  V5  C  Cr(2,6),  it  is  an  invariant  of  our  6  points  in  P3 
under  the  action  of  PGL4. 

The  fact  that  Vs  has  codimension  3  in  the  eight  dimensional  Grassmannian  is  no  accident.  Three  is  precisely  the 
number  of  independent  fundamental  invariants  for  6  points  in  P3  under  projective  linear  transformations. 
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Using  the  Pliicker  embedding  of  Gr(2,6)  in  2P14  we  can  find  equations  for  V 5  in  the  Piucker  coordinates  (the 
determinants  of  the  fifteen  4  by  4  minors  of  A/). 

Now  the  key  point.  Suppose  we  project  into  the  plane  P2  via  perspective  projection.  This  can  be  achieved  by 
premultiply  our  4  x  6  matrix  M  by  a  3  x  4  matrix  of  rank  3.  The  resulting  matrix  N  =  TM  has  columns  representing 
the  homogeneous  coordinates  of  the  images  of  our  six  points  in  some  coordinate  system  on  the  plane  P2. 

N  has  a  kernel  which  is  a  3-dimensional  subspace  H3  of  2R6.  The  key  observation  is  that,  if  if 2  is  the  kernel 
of  M  and  N  =  TM,  then  the  kernel  H3  of  N  contains  if2!  Of  course  this  is  too  simple.  The  real  requirement  is 
that  H3  contain  K2(a i, . . .  ,ae)  for  some  choice  of  the  a*,  i.e.  the  curve  in  Gr(2,6)  which  is  the  Schubert  cycle  of 
planes  through  origin  in  H3  C  P6  must  meet  V 5.  This  can  be  expressed  in  polynomial  terms  and  can  serve  as  an 
alternative  to  our  object  image  equations  above.  These  equations  will  be  global  in  the  sense  that  no  special  position 
assumptions,  beyond  the  points  not  being  coplanar  in  3D  or  collinear  in  2D,  are  required.  In  addition  this  approach 
will  be  more  robust  in  applications;  something  we  will  discuss  in  a  future  paper. 

As  a  concrete  example,  consider  a  smaller  problem.  Let’s  take  a  2  by  4  matrix  of  rank  2 

_  /  Oi  02  03  04  \ 

M  V  fci  b2  63  *>4  ) 


none  of  whose  columns  is  zero.  This  corresponds  to  choosing  4  points  in  the  projective  line  P 1 .  A  special  case  is 

/  Xj  X2  X3  X4  \ 

V  1  1  1  1  ) 

consisting  of  four  points  x*  for  i  =  1, . . . ,  4  in  Rl. 

The  Pliicker  coordinates  of  M  are  (M12  :  A/13  :  Mu  :  A/23  :  A/24  *  M34)  €  P 5  where  Mij  =  det  ^  ^  ^  The 

Piucker  relation 

M12A/34  —  A/13A/24  +  M14M23  —  0 

holds  and  cuts  out  Gr(2,4)  in  P 5. 

Now  let  K2  be  the  kernel  of  M.  The  Piucker  coordinates  of  K 2  can  be  shown  to  be  :  K\z  :  Ku  •  Kzz  • 
Ku  :  K34)  where 

if  12  =  M34  K23  =  M 14 

if  13  =  — M24  if 24  =  —  M\3 

if  14  =  A/23  if 34  —  M\2 

As  we  vary  if 2  by  multiplying  M  on  the  right  by  a  diagonal  matrix 

D(ai,a2>a3,a4)  = 


/  a!  0  0  0  \ 

0  Ck2  0  0 

0  0  03  0 

V  0  0  0  Q4  / 


we  sweep  out  the  “invariant”  variety  V3  c  Gr(2,4)  C  JP5.  In  parametric  form  V3  is  given  by 

(X12  :  Xl3  :  Xu  :  ^23  :  X24  :  X34)  = 

(0:30:4 A/34  :  —&2Q-4M2A  •  0:2 0:3 A/23  :  Oc\Ol4Mu  :  “CtiQ3A/i3  :  Ol\Ol2M\2)< 

Of  course  the  Piucker  relation 

*12*34  ~XlZX24  -XuX22 

holds.  To  describe  V"3  in  Gr(2,4)  we  need  one  additional  equation.  This  is  of  course  equivalent  to  usual  the  cross 
ratio! 
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